Right, so let's start by importing some relevant libraries and the datasets from the previous section.

Quick reminder:

For this part, several graphs will be created. Data scientists are good at managing data and creating all sorts of weird stuff, but these outputs can look confusing to outsiders. That is where Streamlit comes in handy, since it is a powerful tool for building dashboards.

Considering we have 3 groups of different data, it would be too time-consuming to create separate functions for each group. We can make things a lot easier by creating general functions that take any of the groups as input and return the appropriate output. This will also simplify our Streamlit code, making it faster to develop and easier to cache, if need be. By default, let's use the datasets from Group 1 as the basis for our visualizations and analyses.
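As a minimal sketch of what such a general function could look like, the helper below accepts any dataframe with `InvoiceDate`, `Quantity` and `Price` columns and returns revenue per day. The name `daily_revenue`, the definition of revenue as `Quantity * Price`, and the toy `group_1` frame are assumptions made for illustration; they are not the datasets from the previous section.

```python
import pandas as pd


def daily_revenue(df: pd.DataFrame) -> pd.Series:
    """Return Quantity * Price summed per calendar day (hypothetical helper)."""
    revenue = df["Quantity"] * df["Price"]
    # normalize() resets the time to midnight, so rows group by date only.
    return revenue.groupby(df["InvoiceDate"].dt.normalize()).sum()


# Toy stand-in for one group's data (assumed column names and made-up values).
group_1 = pd.DataFrame(
    {
        "InvoiceDate": pd.to_datetime(
            ["2010-01-04 09:15", "2010-01-04 14:30", "2010-01-05 11:00"]
        ),
        "Quantity": [2, 1, 3],
        "Price": [5.0, 10.0, 2.0],
    }
)

print(daily_revenue(group_1))
```

Because the function only relies on column names, the same call should work for the other groups as long as they share this layout.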

Let's get things going:

Uh-oh, looks like our data is messy. Invoices before 12/Dec/2009 were all clumped together into the 12th day of each month, so we don't have daily data for that period.

Let's see what this would look like if we regrouped everything into monthly data.
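One way to do this regrouping, sketched here on made-up rows, is to map each timestamp to its calendar month and sum within each month. The `Revenue` column and the `daily` frame are assumptions for illustration only.

```python
import pandas as pd

# Toy data with assumed column names; values are made up.
daily = pd.DataFrame(
    {
        "InvoiceDate": pd.to_datetime(["2010-01-04", "2010-01-20", "2010-02-03"]),
        "Quantity": [2, 1, 3],
        "Price": [5.0, 10.0, 2.0],
    }
)
daily["Revenue"] = daily["Quantity"] * daily["Price"]

# to_period("M") maps every timestamp to its calendar month.
month = daily["InvoiceDate"].dt.to_period("M").rename("Month")
monthly = daily.groupby(month)["Revenue"].sum()

print(monthly)
```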

Still weird. To be honest, we should just drop all the data from Nov/2009 and earlier.

Also, Dec/2010 seems incomplete, and we should drop it if that is the case. Let's check.

Indeed, data from Dec/2009 starts on the 13th...

... and Dec/2010 only had purchases up to the 11th day...
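A check like this can be done by looking at the first and last invoice date in each month. The sketch below uses a toy frame whose dates are made up to mirror the ones mentioned above; the names `invoices` and `coverage` are hypothetical.

```python
import pandas as pd

# Toy dates chosen to mirror the situation described in the text.
invoices = pd.DataFrame(
    {
        "InvoiceDate": pd.to_datetime(
            [
                "2009-12-13",
                "2009-12-30",
                "2010-06-01",
                "2010-06-30",
                "2010-12-01",
                "2010-12-11",
            ]
        )
    }
)

month = invoices["InvoiceDate"].dt.to_period("M").rename("Month")

# Earliest and latest invoice date per month: a month whose first date is
# late, or whose last date is early, is only partially covered.
coverage = invoices.groupby(month)["InvoiceDate"].agg(first_day="min", last_day="max")

print(coverage)
```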

Let's remove both of these months as well.
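Dropping everything up to Dec/2009 plus Dec/2010 leaves Jan/2010 through Nov/2010. A minimal sketch of that filter, on a toy frame with an assumed `Revenue` column and made-up values:

```python
import pandas as pd

# Toy rows: one per month of interest, values are made up.
invoices = pd.DataFrame(
    {
        "InvoiceDate": pd.to_datetime(
            ["2009-11-12", "2009-12-13", "2010-01-04", "2010-11-30", "2010-12-11"]
        ),
        "Revenue": [100.0, 50.0, 20.0, 30.0, 40.0],
    }
)

month = invoices["InvoiceDate"].dt.to_period("M")

# Keep only the months considered complete: Jan/2010 to Nov/2010 inclusive.
first_full = pd.Period("2010-01", freq="M")
last_full = pd.Period("2010-11", freq="M")
cleaned = invoices[(month >= first_full) & (month <= last_full)]

print(cleaned)
```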

Ok, the columns Quantity, Price and InvoiceDate are no longer relevant in this dataframe, so we can drop them.

The index is wrongly named, so we need to fix that as well.
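Both clean-up steps can be sketched in a couple of lines. This assumes the monthly dataframe kept an index named `InvoiceDate` after grouping and that `Month` is the name we want instead; the `Revenue` column and all values are made up for illustration.

```python
import pandas as pd

# Toy monthly frame; the index name and the Revenue column are assumptions.
monthly = pd.DataFrame(
    {
        "Quantity": [3, 3],
        "Price": [15.0, 2.0],
        "InvoiceDate": pd.to_datetime(["2010-01-20", "2010-02-03"]),
        "Revenue": [20.0, 6.0],
    },
    index=pd.Index(["2010-01", "2010-02"], name="InvoiceDate"),
)

# Drop the columns that no longer carry meaning at the monthly level,
# then give the index a name that matches its contents.
monthly = monthly.drop(columns=["Quantity", "Price", "InvoiceDate"])
monthly = monthly.rename_axis("Month")

print(monthly)
```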

Let's get that into a variable and plot it.

Psst! We can do the same thing for the number of unique invoices!
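For instance, counting distinct invoices per month could look like the sketch below. The `Invoice` column name and the invoice IDs are assumptions made for this toy example.

```python
import pandas as pd

# Toy data: invoice "A1" spans two rows, so it must be counted only once.
invoices = pd.DataFrame(
    {
        "Invoice": ["A1", "A1", "A2", "A3"],
        "InvoiceDate": pd.to_datetime(
            ["2010-01-04", "2010-01-04", "2010-01-20", "2010-02-03"]
        ),
    }
)

month = invoices["InvoiceDate"].dt.to_period("M").rename("Month")

# nunique() counts distinct invoice IDs within each month.
unique_invoices = invoices.groupby(month)["Invoice"].nunique().rename("UniqueInvoices")

print(unique_invoices)
```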